A Multi-Dimensional Evaluation of Synthetic Data Generators
Abstract
Synthetic datasets are gradually emerging as solutions for data sharing. Multiple synthetic data generators have been introduced in the last decade, fueled by advancements in machine learning and by increased demand for fast and inclusive data sharing, yet their utility is not well understood. Prior research has tried to compare the utility of synthetic data generators using different evaluation metrics. These metrics have been found to generate conflicting conclusions, making direct comparison of synthetic data generators very difficult. This paper identifies four criteria (or dimensions) for masked data evaluation by classifying the available utility metrics into categories based on the measure they attempt to preserve: attribute fidelity, bivariate fidelity, population fidelity, and application fidelity. A representative metric from each category is chosen based on popularity and consistency, and the resulting set of metrics is used to compare the overall utility of recent synthesizers across 19 datasets of different sizes and feature counts. The paper also examines correlations between the selected metrics in an attempt to streamline synthetic data utility evaluation.
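The abstract names the four fidelity dimensions but not the specific metric chosen for each. As a rough illustration only, the sketch below scores two of the dimensions with common textbook measures, assuming a table is stored as a dict mapping column names to equal-length lists: attribute fidelity as the mean per-column Hellinger distance between empirical marginal distributions, and bivariate fidelity as the mean absolute difference between pairwise Pearson correlations. The function names (`hellinger`, `pearson`, `attribute_fidelity`, `bivariate_fidelity`) are hypothetical, and these are not necessarily the representative metrics the paper selected.

```python
import math
from collections import Counter
from itertools import combinations


def hellinger(real_col, synth_col):
    """Hellinger distance between the empirical value distributions of two columns.

    Returns 0.0 for identical distributions and 1.0 for disjoint supports.
    """
    p, q = Counter(real_col), Counter(synth_col)
    n_p, n_q = len(real_col), len(synth_col)
    values = set(p) | set(q)
    s = sum((math.sqrt(p[v] / n_p) - math.sqrt(q[v] / n_q)) ** 2 for v in values)
    return math.sqrt(s / 2)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def attribute_fidelity(real, synth):
    """Hypothetical attribute-fidelity score: mean per-column Hellinger distance.

    Lower is better (0.0 = every marginal distribution reproduced exactly).
    """
    return sum(hellinger(real[c], synth[c]) for c in real) / len(real)


def bivariate_fidelity(real, synth):
    """Hypothetical bivariate-fidelity score: mean |corr_real - corr_synth| over column pairs.

    Assumes all columns are numeric. Lower is better (0.0 = all pairwise correlations preserved).
    """
    pairs = list(combinations(sorted(real), 2))
    diffs = [abs(pearson(real[a], real[b]) - pearson(synth[a], synth[b])) for a, b in pairs]
    return sum(diffs) / len(diffs)


# Toy data: the synthetic table copies each column's values exactly
# but reverses the relationship between the two columns.
real = {"age": [1, 2, 3, 4], "income": [2, 4, 6, 8]}
synth = {"age": [1, 2, 3, 4], "income": [8, 6, 4, 2]}

print(attribute_fidelity(real, synth))  # 0.0
print(bivariate_fidelity(real, synth))  # 2.0
```

In this toy case the synthetic table looks perfect on the attribute dimension yet scores as badly as possible on the bivariate dimension, a small example of how metrics from different dimensions can lead to conflicting conclusions about the same generator.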
Similar resources
Synthetic Generators for Cloning Social Network Data
Synthetic social network generators are useful for a variety of purposes, including benchmarking algorithms, modeling human interactions within agent-based simulations, and debugging code. Despite the increased availability of social media data, collecting data directly from these networks is not always feasible due to privacy concerns. Often data access is restricted to “silos” of analysts wit...
Re-Identification and Synthetic Data Generators: A Case Study
Synthetic generators are increasingly used to replace sensitive data with artificial data preserving to a predetermined extent the utility of the original data. When using synthetic data generators, re-identification analysis is usually disregarded on the grounds that, the released data being artificial, no real re-identification is possible. While this may be reasonable if synthetic generation...
Distributed Searching of Multi-dimensional Data: A Performance Evaluation Study
In this paper we present a data structure for searching in multi-dimensional point sets in distributed environments and discuss its experimental evaluation also through a comparison with previous proposals. The data structure is based on an extension of k-d trees. The technological reference context is a distributed environment where multicast (i.e., restricted broadcast) is allowed, but it is ...
A method for 2-dimensional inversion of gravity data
Applying 2D algorithms for inverting the potential field data is more useful and efficient than their 3D counterparts, whenever the geologic situation permits. This is because the computation time is less and modeling the subsurface is easier. In this paper we present a 2D inversion algorithm for interpreting gravity data by employing a set of constraints including minimum distance, smoothness,...
Visualizing Multi-Dimensional Data
High dimensional data visualization is very important in data analysis since it gives a direct and natural view of data. In this paper, we propose a method to visualize large amount of high dimensional data in a 3-D space. In our method, we divide the high dimension data into several groups of lower dimensional data first. Then, we use different icons to represent different groups. Initial expe...
Journal
Journal title: IEEE Access
Year: 2022
ISSN: 2169-3536
DOI: https://doi.org/10.1109/access.2022.3144765